An efficient method to estimate pronunciation from multiple utterances
Abstract
Given K utterances of a word and a set of sub-word units, one may need a generalization of the conventional one-dimensional Viterbi algorithm to decode them jointly and derive their underlying word model (pronunciation). This extension is called the K-dimensional Viterbi algorithm. However, as the number of utterances increases, its complexity grows exponentially, causing a prohibitive computational burden. Here, we propose an approximation algorithm for the K-dimensional Viterbi that uses the available utterances efficiently to estimate the pronunciation. In addition to automatic dictionary generation, it can be used in computationally expensive applications such as lexicon-free training and joint pattern alignment.
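For orientation, the following is a minimal sketch of the conventional one-dimensional Viterbi algorithm that the abstract takes as its starting point. It is not the approximation algorithm proposed in the paper. The function names `viterbi` and `joint_grid_size`, the log-probability table layout, and the assumption that the joint trellis has one time axis per utterance are illustrative choices, not details from the source.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Most likely state path for a single utterance (log domain).

    log_init[s]     : log P(first state = s)
    log_trans[r][s] : log P(next state = s | current state = r)
    log_emit[t][s]  : log P(frame t | state s)
    """
    n_frames = len(log_emit)
    n_states = len(log_init)
    # score[t][s]: best log-probability of any path ending in state s at frame t
    score = [[-math.inf] * n_states for _ in range(n_frames)]
    # back[t][s]: predecessor state on that best path
    back = [[0] * n_states for _ in range(n_frames)]

    for s in range(n_states):
        score[0][s] = log_init[s] + log_emit[0][s]

    for t in range(1, n_frames):
        for s in range(n_states):
            best_prev = max(
                range(n_states),
                key=lambda r: score[t - 1][r] + log_trans[r][s],
            )
            back[t][s] = best_prev
            score[t][s] = (
                score[t - 1][best_prev] + log_trans[best_prev][s] + log_emit[t][s]
            )

    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[-1][s])
    path = [last]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][last]


def joint_grid_size(lengths):
    """Cells in a joint time grid with one axis per utterance (assumed layout)."""
    size = 1
    for n in lengths:
        size *= n
    return size
```

Under the stated assumption, `joint_grid_size([50, 60, 55])` gives 165000 grid cells for three utterances of 50, 60, and 55 frames, before multiplying by the number of states. Each additional utterance multiplies the grid by its frame count, which illustrates the exponential growth the abstract describes.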
Similar resources
Breadth-first search for finding the optimal phonetic transcription from multiple utterances
Extending the vocabulary of a large vocabulary speech recognition system usually requires phonetic transcriptions for all words to be known. With automatic phonetic baseform determination, acoustic samples of the words in question can substitute for the required expert knowledge. In this paper we follow a probabilistic approach to this problem and present a novel breadth-first search algorithm...
Italian speakers learn lexical stress of German morphologically complex words
Italian speakers tend to stress the second component of German morphologically complex words such as compounds and prefix verbs even if the first component is lexically stressed. To improve their prosodic phrasing an automatic pronunciation teaching method was developed based on auditory feedback of prosodically corrected utterances in the learners’ own voices. Basically, the method copies cont...
Considerations on vowel durations for Japanese CALL system
Due to various difficulties in pronunciation, utterances by nonnative speakers may be lacking in fluency. The Japanese pronunciation is said to have mora-synchronism, and, therefore, we assume that the disfluency may cause larger variations in vowel durations. Analyses of vowel (and CV) durations were conducted for Japanese sentence utterances by 2 non-Japanese speakers and one Japanese speaker...
Feature-based Pronunciation Modeling for Speech Recognition
We present an approach to pronunciation modeling in which the evolution of multiple linguistic feature streams is explicitly represented. This differs from phone-based models in that pronunciation variation is viewed as the result of feature asynchrony and changes in feature values, rather than phone substitutions, insertions, and deletions. We have implemented a flexible feature-based pronunci...
Prediction of American listeners’ misrecognition of English words spoken by Japanese
This study tries to automatically estimate the probability of individual spoken words of Japanese English (JE) being perceived correctly by American listeners and to clarify what kind of combinations of segmental, prosodic, and/or linguistic errors are more fatal to the correct recognition. Firstly, from a large speech database of JE, a balanced set of 360 utterances of 90 male speakers were se...